AD-A061 725  FEDERAL AVIATION ADMINISTRATION OKLAHOMA CITY OKLA CI-ETC  F/G 5/10

THE MEASUREMENT AND SCALING OF WORKLOAD IN COMPLEX PERFORMANCE. (U)
SEP 78  W D CHILES, A E JENNINGS, E A ALLUISI

FAA-AM-78-34  NL


UNCLASSIFIED 



FAA-AM-78-34 


THE  MEASUREMENT  AND  SCALING  OF  WORKLOAD  IN  COMPLEX  PERFORMANCE 


W.  Dean  Chiles 
Alan  E.  Jennings 
Earl  A.  Alluisi* 

FAA  Civil  Aeromedical  Institute 
P.O.  Box  25082 

Oklahoma  City,  Oklahoma  73125 

*Performance Assessment Laboratory
Old  Dominion  University 
Norfolk,  Virginia  23508 




Document  is  available  to  the  public  through  the 
National  Technical  Information  Service, 
Springfield,  Virginia  22161 


Prepared  for 

U.S.  DEPARTMENT  OF  TRANSPORTATION 
Federal  Aviation  Administration 
Office  of  Aviation  Medicine 
Washington,  D.C.  20591 


NOTICE 


This document is disseminated under the sponsorship of the
Department of Transportation in the interest of information
exchange. The United States Government assumes no liability
for its content or use thereof.




Technical Report Documentation Page

1. Report No.: FAA-AM-78-34
2. Government Accession No.:
3. Recipient's Catalog No.:
4. Title and Subtitle: THE MEASUREMENT AND SCALING OF WORKLOAD
   IN COMPLEX PERFORMANCE
5. Report Date: September 1978
6. Performing Organization Code:
7. Author(s): W. Dean Chiles, Alan E. Jennings, and Earl A. Alluisi
8. Performing Organization Report No.:
9. Performing Organization Name and Address:
   FAA Civil Aeromedical Institute
   P.O. Box 25082
   Oklahoma City, Oklahoma 73125
10. Work Unit No. (TRAIS):
11. Contract or Grant No.:
12. Sponsoring Agency Name and Address:
    Office of Aviation Medicine
    Federal Aviation Administration
    800 Independence Avenue, S.W.
    Washington, D.C. 20591
13. Type of Report and Period Covered:
14. Sponsoring Agency Code:
15. Supplementary Notes:
    Work was performed under Tasks AM-D-77/78-PSY-57.

16. Abstract

Two groups of young men (Group I, N = 51, tested identically on 2 successive days;
Group II, N = 43, tested on 1 day only) performed various combinations of the six
tasks of the CAMI Multiple Task Performance Battery. Two of the tasks involved the
monitoring of static (lights) and dynamic (meters) processes; the four more-active
tasks involved mental arithmetic, elementary problem solving, pattern identification,
and two-dimensional compensatory tracking. Five of nine performance intervals
provided different complex tasks consisting of both of the monitoring tasks and two
of the active tasks presented concurrently. Other trials provided data on the singly
performed constituent tasks as well as the combined monitoring tasks. Results
indicated that all 12 performance measures varied significantly as a function of the
different task-combination conditions. A standard psychological scaling technique
(Thurstone Case V) was applied to the monitoring data (for the green and red lights
and for the meters) to develop an index of workload for the five complex task
combinations. Since better performance was presumed to indicate a lower workload,
workload was inferred to increase as performance declined across conditions. The
best performances (scale values of zero) were associated with single tasks as
expected. Scale values for the complex task-combination conditions were consistent
between groups and between the 2 days of testing of Group I (r's of .947 to .993).
Although the scale values are specific to the tasks and task-combination conditions
employed, the scaling-procedure application shows promise for cases in which
quantitative measures of performance can be acquired with moderately large (N > 50)
samples of subjects.


17. Key Words: Scaling; Workload; Complex Performance

18. Distribution Statement:
    Document is available to the public through
    the National Technical Information Service,
    Springfield, Virginia 22161

19. Security Classif. (of this report): Unclassified
20. Security Classif. (of this page): Unclassified
21. No. of Pages: 12
22. Price:

Form DOT F 1700.7 (8-72)    Reproduction of completed page authorized


THE  MEASUREMENT  AND  SCALING  OF  WORKLOAD  IN  COMPLEX  PERFORMANCE 


I. Introduction.

The  level  of  performance  that  can  be  expected  of  a properly  trained  and 
selected  human  operator  is  of  great  interest  in  many  decisions  involving  the 
design  of  man-machine  systems.  In  many  transportation  areas,  the  level  of 
operator  performance  has  direct  implications  for  safety.  The  level  of 
performance  has  generally  been  held  to  be  a function  of  three  broad  categories 
of  influences:  personnel  factors,  situational  factors,  and  job  demands.  Job 
demands, i.e., workload, can be considered to be related to the number and
variety  of  skills  that  the  operator  must  exercise  in  performing  his  job  and  to 
the  nature  of  the  specific  skills  involved.  Workload  has  generally  been 
considered  to  be  an  important  modifier  of  performance  under  a variety  of 
personnel  and  situational  conditions,  and  the  assumption  that  workload  can  be 
measured  as  a unitary  concept  is  central  to  many  decisions  affecting  the 
design  of  man-machine  systems.  Thus,  development  of  a methodology  that 
yields  reliable  and  valid  indices  of  operator  workload  should  lead  to 
important  gains  in  safety  and  productivity  through  resultant  modifications  of 
systems  designs  and  operating  procedures. 

There  have  been  two  general  approaches  to  the  definition  and  measurement 
of  workload.  One  approach  has  focused  on  energy  expenditure,  or  effort,  as 
the  central  concept.  This  approach  has  generally  attempted  to  measure 
workload  by  using  indices  based  on  biomedical  measures  or  on  subjective 
reports  of  effort.  The  other  approach  has  focused  on  the  measurement  of 
performance  or  of  system  output  under  the  assumption  that  operators  under  an 
increased  workload  will  not  be  able  to  perform  as  well. 

One  method  of  studying  workload  under  the  latter  approach  has  been 
through the use of secondary or loading tasks. Knowles (8) summarized early
work  of  this  sort  and  provided  the  general  rationale  for  the  application  of 
the  technique  to  workload  measurement.  The  basic  approach  in  this  method  is 
to  compare  the  level  of  performance  achieved  on  the  secondary  task,  when 
performed  alone,  with  the  level  achieved  at  that  task  when  it  is  performed  in 
combination  with  the  primary  task.  Kelley  and  Wargo  (7)  noted  that,  in  most 
tasks,  operators  will  tend  to  form  their  own  criterion  as  to  how  well  a given 
task  should  be  performed  and  then  vary  their  expenditure  of  effort  up  or  down 
to  meet  it.  Once  this  criterion  is  established,  performance  measures  on  that 
task  become  relatively  invariant,  since  insofar  as  possible,  operators  will 
adjust  their  level  of  effort  to  maintain  the  criterion  level  of  performance. 
Therefore,  the  secondary  task  is  added  to  the  primary  task  in  order  to  increase 
the  subject's  workload  and  to  obtain  an  index  of  the  amount  of  spare  time  that 
the subject has while performing the primary task.

The present study is aimed at developing an objective method for scaling
different levels of workload. The Civil Aeromedical Institute's Multiple
Task Performance Battery (MTPB) was used to provide several tasks (monitoring
lights, monitoring meters, two-dimensional compensatory tracking, pattern
identification, mental arithmetic, and problem solving) in different
combinations to generate varying job demands and presumably varying levels of
workload. Previous studies ( .' , l, A) with the MTPB have demonstrated that,
under various sorts of stress, subjects will tend to protect their
performance on the four more-active tasks (i.e., the tasks that require action more
often on the part of the subjects; viz, pattern identification, mental
arithmetic, problem solving, and tracking), while allowing performance on the
monitoring tasks to be negatively affected. Thus, there is a strong tendency
for subjects being tested on the MTPB to treat the MTPB tasks as a primary/
secondary task combination even though they are instructed to treat all tasks
as equally important. In the present study, the monitoring tasks are
considered to be secondary tasks, while the four more-active tasks of the MTPB
are considered the primary tasks.

II. Method.


Subjects. A total of 94 subjects were tested. All were paid male
volunteers between the ages of [illegible]. The subjects were divided into two
groups: Group I (51 subjects) was tested on 2 days, with the same testing
schedule for both days, and Group II (43 subjects) was tested on 1 day only.

Performance Battery. The MTPB consists of six tasks that can be
presented singly or in any combination. It is computerized so that all
signals, problems, and display changes are presented under program control,
and all scoring of times and responses are stored for off-line analysis. The
same problems or intersignal intervals were presented to all subjects in
corresponding testing sessions. Brief descriptions of the nature and
performance demands of the tasks follow.

1. Red and green lights monitoring (LT). Pairs of integral lights/
switches are located at each corner and in the center of the subject panels.
The upper light/switch of each pair is red and the lower is green. Normally,
the red lights are off and the green lights are on. A signal on this task
consists of a change of state of one of the 10 lights, and a response is made
by pushing the appropriate switch; this action returns the light to its
normal state. Signals are introduced at random intervals, averaging one
signal per minute; response time is recorded separately for the red and green
lights. Signals that are not responded to are removed after 15 s, and the
response time on that signal is scored as 15 s.

2. Meter monitoring (MTR). The display for this task consists of
four edge-reading meters mounted near the top of the subject panel with two
pushbutton switches mounted beneath each meter. Normally, the meter pointers
are moving at random around an average position of zero (center). When a
signal  is  introduced,  the  pointer  movement  continues  as  before,  except  that 
the  average  position  of  the  pointer  is  shifted  either  to  the  left  or  right  by 
an  amount  that  is  approximately  equal  to  the  maximum  excursion  resulting  from 
the  random  movement.  The  subject  responds  to  a signal  on  a given  meter  by 
pressing  the  button  under  that  meter  on  the  same  side  as  the  signal.  When  any 
meter  response  button  is  depressed,  the  random  movement  is  removed  and  the 
pointer  of  that  meter  stops  on  its  average  value,  thus  giving  immediate 
feedback  as  to  the  accuracy  of  the  response.  Signals  are  presented  at  an 
average  rate  of  one  per  minute,  and  a given  signal  remains  until  it  is 
responded  to  or  until  it  is  replaced  by  the  next  signal.  Response  time  is 
computed  as  the  average  time  the  signal  is  present  on  the  meters;  i.e.,  if  a 
subject  does  not  respond  to  a given  signal,  the  time  that  signal  is  present  is 
included  in  the  response  time  for  the  succeeding  signal. 

3.  Mental  arithmetic  (MATH).  The  problems  for  this  task  are 
presented  on  a Burroughs  self-scan  display  mounted  at  the  bottom  center  of  the 
subject  panel.  All  of  the  problems  are  of  the  form,  A + B - C = ?,  and  are 
made  up  of  numbers  from  11  to  99.  The  subjects  respond  by  entering  their 
answers  on  a reverse-order  serial  entry  keyboard;  it  requires  that  the  least 
significant  digit  be  entered  first.  The  answer  is  displayed  on  the  screen  as 
it  is  entered  from  the  keyboard  and  may  be  cleared  and  changed  by  the  subject. 
When the subject has entered what he considers to be the correct answer, he
depresses the "complete" button. At that time the problem and answer are
removed  from  the  screen  and  the  subject  is  given  feedback  as  to  the  accuracy 
of  his  answer  ("R"  for  right,  "W"  for  wrong).  New  problems  are  presented  at 
20-s  intervals.  Response  time  is  scored  from  the  introduction  of  a problem  to 
the  time  when  the  subject  presses  the  "complete"  button.  Accuracy  is  computed 
as  the  proportion  of  correct  answers  to  total  problems  presented. 

4. Pattern identification (PID). The upper left portion of the
Burroughs display is used to present the six-column by six-row patterns in this
task.  All  patterns  are  in  the  form  of  vertical  bargraphs  with  each  column 
height  from  one  through  six  appearing  just  once.  The  problems  on  this  task 
are  analogous  to  questions  on  a multiple-choice  examination.  The  first 
pattern  for  a given  problem  is  the  standard  or  "question"  pattern.  This 
pattern  is  followed  by  two  comparison  patterns  presented  in  succession.  The 
subject  must  decide  if  one,  neither,  or  both  of  the  comparison  patterns  were 
the  same  as  the  standard.  Answers  are  indicated  by  depressing  one  of  three 
buttons.  On  entering  an  answer,  feedback  is  provided  by  displaying  the 
correct  answer  on  the  screen.  The  standard  pattern  appears  for  5 s,  and  each 
comparison  pattern  appears  for  2 s with  1 s between  patterns.  New  problems 
are  presented  every  30  s.  Speed  of  response  (from  the  onset  of  the  second 
comparison  pattern)  and  accuracy  (proportion  of  correct  responses  to  total 
problems  presented)  are  recorded. 

5. Problem solving (PS). Each subject panel is equipped with five
pushbutton switches, a white "task active" light, and three "feedback" lights.
The task requires the subject to discover the correct sequence in which to
press the five buttons. Each button appears only once in a given solution.
Any time a button is pressed, the amber light is illuminated to show that the
response has been acknowledged by the system. A red light provides error
feedback and is illuminated whenever the subject makes a response that is not
on the correct solution sequence, and turned off when the response is on the
correct sequence. Once the subject has pushed all five buttons in the correct
order, a blue light is illuminated for 20 s, indicating that the problem has
been solved. The next problem is then presented. Each solution is presented
twice in succession, and the subject is expected to reenter the previous
solution from memory on the second, or confirmation, presentation. Several
measures are derived for this task: (a) the speed of solution of the first
presentation of a problem, (b) the speed of reentering the solution in the
confirmation phase, (c) the proportion of redundant responses made during the
solution phase (responses made when information already acquired should make
the subject aware that the response being made is not correct), and (d) the
proportion of error responses made on the confirmation entry of the solution.


6. Tracking (TRK). The display for the tracking task is a
cathode-ray tube (CRT) mounted on the upper-center of the subject panel. The
target is a dot of light on the CRT, and the center of the CRT is defined by
horizontal and vertical crosshairs on the screen. The subject's task is to
use a control stick to attempt to counteract a random forcing function and
keep the target as near the center of the screen as possible. The forcing
function changes the direction of target movement every 1 s. Performance of
the tracking task is scored by analog circuitry that integrates absolute error
and error squared for each dimension. The error-squared measure is converted
to RMS (root-mean-square) error, and vector RMS and vector absolute error
measures are derived from horizontal and vertical error scores. Since these
measures are highly intercorrelated, vector RMS error is used as a single
index of tracking performance.
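
As a rough illustration of the tracking scores just described, the following Python sketch computes absolute and RMS error for each axis from sampled error values and then combines the two axes. The battery itself used analog integration; the sampled-data form, the function names, the sample values, and the root-sum-of-squares rule for combining the axes are all assumptions made for illustration, since the report does not state how the vector measures were formed.

```python
import math

def axis_errors(samples):
    """Return (mean absolute error, RMS error) for one axis of tracking error."""
    n = len(samples)
    mean_abs = sum(abs(e) for e in samples) / n
    rms = math.sqrt(sum(e * e for e in samples) / n)
    return mean_abs, rms

def vector_errors(horizontal, vertical):
    """Combine per-axis scores into vector scores.

    Assumption: the axes are combined by root-sum-of-squares.
    """
    abs_h, rms_h = axis_errors(horizontal)
    abs_v, rms_v = axis_errors(vertical)
    return math.hypot(abs_h, abs_v), math.hypot(rms_h, rms_v)

# Hypothetical error samples (arbitrary units) for the two display axes.
horizontal = [3.0, -3.0, 3.0, -3.0]
vertical = [4.0, -4.0, 4.0, -4.0]
vector_abs, vector_rms = vector_errors(horizontal, vertical)
```

With these made-up samples the per-axis RMS errors are 3 and 4, so the assumed vector RMS error is 5.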


Procedure. All subjects were trained for 1 hour and then tested in a
2-hour session in which the six MTPB tasks were presented individually for 15
minutes each and the two monitoring tasks were presented together for two
15-minute intervals (LT/MTR-1 and LT/MTR-2). Following 1 or more hours of
rest, the subjects were tested for 2½ hours on five complex tasks (30 minutes
on each): (a) PS/TRK, (b) MATH/PS, (c) PID/TRK, (d) PID/PS, and (e) MATH/TRK.
All five complex-task combinations also included the light-monitoring and
meter-monitoring tasks. The same testing schedule was repeated for Group I on
the following day (Group II was measured only on the first day). The 1-hour
training period was considered sufficient for purposes of this study, but
considerably longer periods are generally required for subjects to reach
stable performance on the complex tasks (1). Consequently, it was anticipated
that scores for Group I would show some improvement on the second day of
testing.


III. Results and Discussion.


The differences between the 2 days of Group I's performance were tested
with an analysis of variance computed for each of the 12 measures of
performance. Statistically significant (.05 level) improvements of Day 2's
performances over Day 1 were found, as expected, with several measures: PS
speed in all conditions, PS accuracy in the PS/TRK condition, MATH speed and
accuracy in the MATH/PS condition, and green lights in the MATH/PS and PID/PS
conditions. In no case was Day 2's performance worse than that of Day 1.

The means of the performances obtained on Day 1 with Group I (upper
figure in each pair) and with Group II (lower figure) are shown in Table 1
for the nine intervals of performance. The specific task combination employed
in any given interval is indicated by the presence of scores for that task in
the table.

An analysis of variance was computed with each of the 12 task-performance
measures to test for differences between the two groups. Group I was
significantly (.05 level) better than Group II on PS (accuracy), measure #7 in
the table, under all four of the intervals in which that task was included.
No other statistically significant difference was found between the two
groups.

The reliability of each of the 12 measures was computed with the data of
the two groups separately (Day 1 only for Group I). The results are given in
Table 2. In each case, the coefficient of correlation for a given measure is
the intraclass correlation of that measure from all the task combinations
(intervals) in which it appeared. Thus, the intraclass correlation coefficient
reported is equal to the mean of all intercorrelations among the measures.

Each of the 24 reliability coefficients given in Table 2 is statistically
significant (.05 level). With the exception of MTR (response time), measure #3,
with Group I where the reliability was .21, the coefficients ranged
between .42 (measure #2, Group I) and .91 (measure #4, Group II). These
reliabilities are comparable to those found in recent studies with the MTPB
(5,6), as well as those found with other versions of the battery (cf. 1,
pp. 167-170). In subsequent analyses, the data of the two groups (Day 1
of Group I, and the 1 day of performance of Group II) were combined and
treated as a single group.

Each of the 12 measures of performance was tested with analysis of
variance for differences across the task combinations (intervals) in which
it was employed. The between-intervals variance was statistically
significant (.01 level) in each of the 12 cases, and it can be inferred that
the performance of each task was affected by the combination of tasks with
which it was performed.

The data of the monitoring tasks (measures #1, #2, and #3 in Table 1),
which appeared in all intervals and which were considered to act as secondary
tasks in the complex situations (intervals 5 through 9), were used in a
Thurstone Case V scaling procedure (9) to develop a scale of workload for the
different task combinations. Data from single-task performance (LT without
MTR in interval 1 and the converse in interval 2) and combined monitoring


TABLE 2. Reliability (Intraclass Correlation Coefficients*)
Computed Across MTPB Task Combinations**

                                              Experimental Group
Measure   Task (and Measure)                  I (Day 1)    II

   1      Green LT (Response Time)              .51        .46
   2      Red LT (Response Time)                .42        .49
   3      MTR (Response Time)                   .21        .59
   4      MATH (Solution Time)                  .73        .91
   5      MATH (Accuracy)                       .82        .77
   6      PS (Time/Prob)                        .59        .69
   7      PS (Accuracy)                         .85        .56
   8      PS (Confirmation Time/Prob)           .67        .61
   9      PS (Confirmation Accuracy)            .80        .52
  10      PID (Response Time)                   .60        .61
  11      PID (Accuracy)                        .59        .70
  12      TRK (RMS Error)                       .64        .73

 * All 24 coefficients of correlation are statistically
   significant (.05 level in each case).

** The different task combinations employed in the 9 performance
   intervals are shown in Table 1.

performance (intervals 3 and 4) were included with data from the complex-task
combinations of intervals 5 through 9 in the scaling procedure. The level of
performance was assumed to be inverse to workload; i.e., the greater the
workload, the poorer the performance on the secondary, or monitoring, tasks.
Thus, the scaling was an inverse scale of performance and a direct scale of
workload: the higher the scale value, the lower the performance and the higher
the workload represented by the task combination.

Identical scaling procedures were applied to three separate sets of data:
Group I's Day 1 and Day 2 data, and Group II's data. The scaling was
accomplished by comparing the performance of a given monitoring task (measures #1,
#2, and #3) under each task combination (interval), including the single-task
performances, with those obtained under all other task combinations (intervals).
In each case, the proportion of subjects who performed better under the given
condition was noted, and these proportions were then converted to
normal-deviate (z) scores by use of a table of probabilities associated with the
normal distribution. The normal-deviate scores were then reflected; i.e.,
multiplied by -1, and the mean z score associated with each task combination
was computed. The most negative mean thus represented the best performance
and the presumed lowest workload, so within each measure, the largest negative
value was subtracted from each of the means, thereby providing a score of zero
for the condition with the best performance and lowest workload, and increasing
positive scale values for lesser performances and greater workloads. For each
of the three monitoring measures, the "zero workload" condition was that of
single-task performance. A mean of the three scale scores associated with
each interval (task combination) was computed to provide a single workload
scale value (WSV) for each interval. The results of the scaling are reported
in the WSV rows of Table 3.
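
The scaling steps described above can be sketched in Python for a single monitoring measure. This is a minimal illustration, not the authors' computation: the function name, the made-up response times, the treatment of ties, the exclusion of a condition's comparison with itself from the mean, and the clipping of proportions of exactly 0 or 1 (which would otherwise give infinite z scores) are all assumptions.

```python
from statistics import NormalDist

def workload_scale(scores):
    """Scale conditions from per-subject response times (lower = better).

    scores maps each condition to a list of response times, one per subject,
    in the same subject order. Returns a dict of scale values in which the
    best condition is 0 and larger values mean poorer performance.
    """
    conditions = list(scores)
    n_subjects = len(scores[conditions[0]])
    inv_cdf = NormalDist().inv_cdf
    means = {}
    for a in conditions:
        z_values = []
        for b in conditions:
            if a == b:
                continue
            # Proportion of subjects who were faster under a than under b.
            better = sum(x < y for x, y in zip(scores[a], scores[b]))
            p = better / n_subjects
            # Assumption: clip so that p = 0 or 1 gives a finite z score.
            p = min(max(p, 0.01), 0.99)
            # Reflect the z score so that high values mean poor performance.
            z_values.append(-inv_cdf(p))
        means[a] = sum(z_values) / len(z_values)
    lowest = min(means.values())
    return {c: m - lowest for c, m in means.items()}

# Hypothetical response times (s) for four subjects under three conditions.
times = {
    "single": [1.0, 1.2, 0.9, 1.1],
    "dual": [1.5, 1.1, 1.4, 1.6],
    "complex": [2.0, 2.2, 1.3, 2.5],
}
scale = workload_scale(times)
```

In this toy example the single-task condition receives the scale value of zero and the values increase for the dual and complex conditions. Averaging such values over the three monitoring measures would give one WSV per interval.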


TABLE 3. Workload Scale Values (WSV) and Adjusted Workload Scale Values (AWSV)

                          LT/     PS/     MATH/   PID/    PID/    MATH/
                          MTR     TRK     PS      TRK     PS      TRK

Group I (Day 1):  WSV     0.52    1.72    2.40    1.61    2.27    1.71
                  AWSV    0.00    1.20    1.88    1.09    1.75    1.19

Group I (Day 2):  WSV     0.49    1.47    1.91    1.31    1.85    1.41
                  AWSV    0.00    0.98    1.42    0.82    1.36    0.92

Group II:         WSV     0.56    1.84    2.54    1.90    2.45    2.03
                  AWSV    0.00    1.28    1.98    1.34    1.89    1.47

Although single-task performance scores on the LT monitoring and MTR
monitoring tasks were each assigned a WSV of zero, this should not be
interpreted to mean that such single-task performance is actually a no-workload
condition. On the contrary, the zero WSVs assigned to single-task performance
represent an arbitrary origin for the scale values. The individual WSVs
calculated for the intervals LT/MTR-1 and LT/MTR-2 are independent estimates
of the workload imposed by concurrent performance of those two monitoring
tasks. The WSVs associated with the five complex task conditions represent
concurrent performance of the two active tasks indicated in each case, and
also concurrent performance of the two monitoring tasks. Thus, the WSVs
associated with the complex performance conditions must be further adjusted if
they are to represent only the workload of the condition attributable to the
active tasks. Therefore, the mean WSV of LT/MTR-1 and LT/MTR-2 was
subtracted from the WSV associated with each of the complex tasks to obtain
an adjusted workload scale value (AWSV) for each complex performance
condition. This AWSV reflects the workload of combinations of the active tasks
after adjustment for the workload imposed by the monitoring tasks. These
values are presented in the AWSV rows of Table 3.
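
The adjustment amounts to a single subtraction per condition, sketched below in Python. The function name is hypothetical, and the two separate monitoring-interval WSVs are made-up values chosen to average 0.52, the combined LT/MTR value tabulated for Group I on Day 1; the two complex-task WSVs are taken from the same row of Table 3.

```python
def adjusted_wsv(complex_wsv, monitoring_wsv):
    """Subtract the mean combined-monitoring WSV from each complex-task WSV."""
    baseline = sum(monitoring_wsv) / len(monitoring_wsv)
    return {task: round(wsv - baseline, 2) for task, wsv in complex_wsv.items()}

# Hypothetical WSVs for LT/MTR-1 and LT/MTR-2 (their mean is 0.52).
monitoring = [0.50, 0.54]
# WSVs for two of the complex conditions, Group I, Day 1.
complex_tasks = {"PS/TRK": 1.72, "MATH/PS": 2.40}
awsv = adjusted_wsv(complex_tasks, monitoring)
```

Under these inputs the sketch gives adjusted values of 1.20 for PS/TRK and 1.88 for MATH/PS.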

The scale values obtained for the five "complex" task combinations were
highly correlated among the three sets: (a) .993 for Day 1 versus Day 2 for
Group I, (b) .973 for Day 1 Group I versus Group II, and (c) .947 for Day 2
Group I versus Group II. The variabilities of the scores within the three
sets of data were essentially identical, but the means differed in agreement
with the prior finding that Group I was better on Day 2 than on Day 1 and
better on some measures on the first day than Group II. The high WSVs
obtained for the combinations MATH/PS and PID/PS (see Table 3) are consistent
with the informal reports of subjects and the observations and impressions of
the experimenters; subjects who commented on tasks generally mentioned these
as being, respectively, the most difficult.

In making a determination of workload, it would be useful in some
circumstances if the contribution of individual tasks in a task complex could be
evaluated. Such values could be used, for example, to estimate the workload
of new combinations of tasks. The simplest model for analyzing the workload
attributable to single tasks is to assume additivity; i.e., that particular
combinations of tasks do not interact in unique ways and that a given task
will impose the same degree of workload regardless of its combination with
other tasks. This model is consistent with an analysis of workload in terms
of "spare time." To test this model, the AWSVs were analyzed for the workload
contribution of the individual active task, using the assumption that the
workload imposed by these tasks was linearly additive. Specifically, task
workload scale values (TWSVs) were derived by a modification of the method
presented by Clark (4) as follows: The AWSVs from each data set were placed
in a symmetric array, with rows and columns representing each of the active
tasks. Then each row of the array was used to generate a linear equation,
assuming that each cell value represents the sum of two unknown variables
associated with the tasks presented in that combination. For example, the
first row might contain the three AWSVs associated with complex tasks that
included tracking. So, the equation associated with that row would be:

    3*TRK + MATH + PS + PID = sum of AWSVs in first row.

The four simultaneous equations thus generated were then solved to provide
the TWSVs presented in Table 4.
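
The row equations can be set up and solved mechanically, as in the following Python sketch. It is an illustration under stated assumptions rather than the authors' procedure: the small Gauss-Jordan solver and all variable names are hypothetical, and the inputs are the Group I (Day 1) AWSVs for the five combinations that were tested. Because MATH/PID was not among those combinations, the MATH and PID rows in this sketch hold two cells each instead of three.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

tasks = ["TRK", "MATH", "PID", "PS"]
# Group I, Day 1 AWSVs for the five tested combinations.
awsv = {
    ("PS", "TRK"): 1.20,
    ("MATH", "PS"): 1.88,
    ("PID", "TRK"): 1.09,
    ("PID", "PS"): 1.75,
    ("MATH", "TRK"): 1.19,
}

# One equation per task: (cells in its row) * task + each partner task
# = sum of the AWSVs in that row.
a = [[0.0] * len(tasks) for _ in tasks]
b = [0.0] * len(tasks)
for (first, second), value in awsv.items():
    i, j = tasks.index(first), tasks.index(second)
    a[i][i] += 1
    a[i][j] += 1
    b[i] += value
    a[j][j] += 1
    a[j][i] += 1
    b[j] += value

twsv = dict(zip(tasks, solve_linear(a, b)))
```

Solving these four equations gives approximately 0.2625 for TRK, 0.9350 for MATH, 0.8200 for PID, and 0.9375 for PS, which agrees with the Group I (Day 1) row of Table 4.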


TABLE 4. Workload Scale Values Associated With Active Tasks (TWSV)

                      TRK       MATH      PID       PS

Group I (Day 1)       0.2625    0.9350    0.8200    0.9375
Group I (Day 2)       0.2300    0.6800    0.6000    0.7500
Group II              0.3750    1.0850    0.9750    0.9050

The  method  employed  to  produce  the  TWSVs  in  Table  4 provides  values  that 
reproduce  the  row  and  column  values  exact’y,  as  well  as  least  squares 
approximations  to  the  cell  values.  These  derived  workload  values  for 
individual  tasks  can  be  combined  by  simple  addition,  under  the  assumption  of 
linear  additivity  to  provide  "predictions"  of  the  workloads  imposed  by  the 
different  combinations  of  tasks.  The  "predicted"  workload  scores  derived  from 
the  TWSVs  are  presented  in  Table  5,  along  with  the  AWSVs  that  they  should 
predict.  It  may  be  noted  that  the  predictions  closely  approximate  the 
original values and thus, in this case, it seems that the workload imposed by
the active tasks may be considered to be linearly additive. This is not to
say that the assumption of linear additivity is always likely to be
appropriate. In fact, it seems probable that workload would not be additive for
some  particular  combinations  of  tasks  (cf.  10).  In  those  cases,  however,  it 
would  be  possible  to  identify  the  nonadditive  combinations  by  comparing  the 
estimated  and  observed  values  and  noting  where  relatively  greater  divergences 
occur. 


TABLE 5.  Actual Adjusted Workload Scale Values (AWSV) vs. Predicted Values
          (Based on TWSV Sums)


                              PS/TRK  MATH/PS  PID/TRK  PID/PS  MATH/TRK  RMS Error

Group I (Day 1)   AWSV         1.20    1.88     1.09     1.75    1.19
                  Predicted    1.20    1.87     1.08     1.76    1.20      .0067

Group I (Day 2)   AWSV         0.98    1.42     0.82     1.36    0.92
                  Predicted    0.98    1.43     0.83     1.35    0.91      .0089

Group II          AWSV         1.28    1.98     1.34     1.89    1.47
                  Predicted    1.28    1.99     1.35     1.88    1.46      .0089
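The comparison of observed and predicted values can be reproduced with a short Python sketch. It assumes the TWSVs of Table 4 and the rounded AWSVs of Table 5 as inputs. The names predict and rms_error are hypothetical, and the RMS error is taken here as the root mean square of the five residuals (observed minus unrounded predicted), which is an assumption about how the reported figure was computed.

```python
import math

# Task pairs in the column order of Table 5.
COMBOS = [("PS", "TRK"), ("MATH", "PS"), ("PID", "TRK"),
          ("PID", "PS"), ("MATH", "TRK")]

# TWSVs from Table 4.
TWSV = {
    "Group I (Day 1)": {"TRK": 0.2625, "MATH": 0.9350, "PID": 0.8200, "PS": 0.9375},
    "Group I (Day 2)": {"TRK": 0.2300, "MATH": 0.6800, "PID": 0.6000, "PS": 0.7500},
    "Group II": {"TRK": 0.3750, "MATH": 1.0850, "PID": 0.9750, "PS": 0.9050},
}

# Observed AWSVs from Table 5, in the same order as COMBOS.
AWSV = {
    "Group I (Day 1)": [1.20, 1.88, 1.09, 1.75, 1.19],
    "Group I (Day 2)": [0.98, 1.42, 0.82, 1.36, 0.92],
    "Group II": [1.28, 1.98, 1.34, 1.89, 1.47],
}


def predict(twsv, combo):
    """Additive prediction: the sum of the TWSVs of the tasks in the combination."""
    return sum(twsv[task] for task in combo)


def rms_error(twsv, observed):
    """Root-mean-square difference between observed and predicted values."""
    residuals = [obs - predict(twsv, combo) for combo, obs in zip(COMBOS, observed)]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


for group in TWSV:
    print(f"{group}: RMS error = {rms_error(TWSV[group], AWSV[group]):.4f}")
# Group I (Day 1): RMS error = 0.0067
# Group I (Day 2): RMS error = 0.0089
# Group II: RMS error = 0.0089
```

Under these assumptions the sketch returns the RMS errors listed in Table 5. A combination whose residual stood well apart from the others would be a candidate for the nonadditive cases discussed in the text.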

The  workload  attributable  to  the  various  active-task  combinations  was 
not  uniform  across  the  three  monitoring  tasks  (although  this  is  not  evident  in 
the  summary  data  presented  here).  Combinations  that  involved  TRK  had  the 
smallest  effect  on  performances  of  all  three  monitoring  tasks.  Combinations 
that  involved  MATH  had  the  largest  negative  effect  on  MTR,  whereas  those  that 
involved  PS  had  their  largest  negative  effect  on  the  LT  (the  sole  exception 
was with green lights, Group II).  This suggests that the outcome of the
scaling procedure used here, and recommended as a method of establishing
indices of workload, is dependent not only on the difficulties of the
primary  tasks  that  are  being  scaled,  but  also  to  some  extent  on  the  nature  of 
the  secondary  tasks  used  for  the  scaling  procedure. 

Since  the  contributions  of  individual  tasks  to  the  workload  of  task 
combinations  could  be  reliably  estimated  with  the  method  employed  here,  it 
seems  safe  to  infer  that  similar  scaling  procedures  could  be  validly  applied 
to  predict  workloads  for  task  combinations  in  other  studies.  Depending  on 
the combinatorial nature of the tasks (e.g., whether they may be considered
additive  or  not),  the  methodology  could  be  applied  to  prediction  of  workload 
where  no  test  data  are  available  for  these  combinations;  provided,  of  course, 
that  appropriate  data  are  available  for  the  individual  tasks  involved  (e.g., 
TWSVs  as  in  Table  4).  The  method  and  its  potential  utility  are  sufficiently 
promising  to  warrant  further  development  and  study. 
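As a minimal illustration of predicting a combination for which no test data exist, the Python sketch below sums the TWSVs for every task pair that does not appear in Table 5. With four active tasks there are six possible pairs, of which five were tested, so the only remaining pair is MATH/PID. The function name predict_untested is hypothetical, and the resulting figure is only the additive estimate implied by Table 4. It is not a value reported in this study, and it holds only if the additivity assumption does.

```python
from itertools import combinations

# Group I (Day 1) TWSVs as read from Table 4.
TWSV_DAY1 = {"TRK": 0.2625, "MATH": 0.9350, "PID": 0.8200, "PS": 0.9375}

# The five task pairs for which AWSVs are reported in Table 5.
TESTED = {
    frozenset(pair)
    for pair in [("PS", "TRK"), ("MATH", "PS"), ("PID", "TRK"),
                 ("PID", "PS"), ("MATH", "TRK")]
}


def predict_untested(twsv, tested):
    """Additive workload estimates for every task pair absent from `tested`."""
    return {
        pair: twsv[pair[0]] + twsv[pair[1]]
        for pair in combinations(sorted(twsv), 2)
        if frozenset(pair) not in tested
    }


estimates = predict_untested(TWSV_DAY1, TESTED)
for (first, second), value in estimates.items():
    print(f"{first}/{second}: {value:.3f}")
# MATH/PID: 1.755
```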
IV.  Summary.


A scale of workload was derived for five complex task-combination
conditions.  The scale provided reliable values that were stable on
replication.  Where the assumption can be made that the tasks combine in a
linearly additive manner, with substantially no task-combination interactions,
the scale can be used to estimate both (a) the relative workload contribution
of each of the tasks performed in the several task combinations, and (b) the
resultant workloads of combinations involving those tasks, including possibly
combinations other than those from which the data were derived.  The
methodology should be applicable to other measures as well, e.g., to
biomedical indices of stress or to subjective ratings.  The major restriction
to the method's use is the requirement that ‘>0 or more subjects be employed
in order to yield stable scale values (cf. l> ).  The availability and use of a
valid index of workload would result in gains in both safety and
productivity by providing clearer specifications of the demands that are
placed on operators under different conditions.  The present technique has
provided valid indices in this laboratory study.  Should it prove to be
reliable and valid in operational situations, its use to provide workload
specifications should be quite beneficial to the design of both systems and
operating procedures.